Exploratory Data Analysis of WDI Dataset

Amanda Yang

2024-10-09

Key Indicators Overview

In this analysis, we explore three key indicators from the WDI dataset:

  1. GDP per Capita
  2. Life Expectancy
  3. Unemployment Rate

Summary Statistics

Below are the summary statistics for the selected indicators.

import pandas as pd
df = pd.read_csv('wdi.csv')
df[['gdp_per_capita', 'life_expectancy', 'unemployment_rate']].describe()
gdp_per_capita life_expectancy unemployment_rate
count 203.000000 209.000000 186.000000
mean 20345.707649 72.416519 7.268661
std 31308.942225 7.713322 5.827726
min 259.025031 52.997000 0.130000
25% 2570.563284 66.782000 3.500750
50% 7587.588173 73.514634 5.537500
75% 25982.630050 78.475000 9.455250
max 240862.182448 85.377000 37.852000

Key Takeaways of Summary Statistics

  • The GDP per capita has a wide range, from $259 to over $240,000, with an average of $20,345 and a high standard deviation of $31,309, indicating large economic inequality between countries
  • Life expectancy is relatively more consistent, with an average of 72.42 years and a narrower range between 53 and 85 years. The median life expectancy is 73.51 years, suggesting that most countries fall within a typical range for longevity
  • The unemployment rate shows significant variation, with a mean of 7.27% and a standard deviation of 5.83%. The lowest unemployment rate is 0.13%, while the highest reaches 37.85%, reflecting considerable differences in l abor market conditions

Visualization of Data

GDP per Capita

Bar chart: GDP per capita for all countries.

import matplotlib.pyplot as plt
import seaborn as sns

# Plot distribution of GDP per capita
plt.figure(figsize=(8, 5))
sns.histplot(df['gdp_per_capita'].dropna(), kde=True, bins=30, color='blue')
plt.title('Distribution of GDP per Capita')
plt.xlabel('GDP per Capita')
plt.ylabel('Frequency')
plt.show()

Interpretation of GDP per Capita

The distribution of GDP per capita is heavily skewed to the right, with most countries having a relatively low GDP per capita. A few countries have very high GDP per capita values, as seen from the long tail of the distribution.

Life Expectancy vs. GDP per Capita

Scatterplot: the relationship between life expectancy and GDP per capita.

# Scatter plot between life expectancy and GDP per Capita
plt.figure(figsize=(8, 5))
sns.scatterplot(data=df, x='gdp_per_capita', y='life_expectancy', hue='country', legend=False)
plt.title('Life Expectancy vs. GDP per Capita')
plt.xlabel('GDP per Capita')
plt.ylabel('Life Expectancy')
plt.show()

Interpretation of Life Expectancy vs. GDP per Capita

There is a positive correlation between life expectancy and GDP per capita. Countries with higher GDP per capita tend to have higher life expectancies. However, this relationship flattens as GDP per capita increases, suggesting diminishing returns on life expectancy beyond a certain level of GDP per capita, corresponding to the life expectancy level among developing states in the world today.

Unemployment

Barplot: the distribution of the unemployment rate of all countries.

# Plot distribution of unemployment rate
plt.figure(figsize=(8, 5))
sns.histplot(df['unemployment_rate'].dropna(), kde=True, bins=30, color='green')
plt.title('Distribution of Unemployment Rate')
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Frequency')
plt.show()

Interpretation of Unemployment

The unemployment rate distribution shows that most countries have an unemployment rate between 0% and 15%. A smaller number of countries have higher unemployment rates, with a few exceeding 25%. The data is right-skewed.

Conclusion

  • The analysis shows large disparities in GDP per capita across countries.
  • Life expectancy correlates positively with GDP per capita.
  • Unemployment rates vary significantly across countries.

Thank You!